Method of detecting anomalies in a communication system using numerical packet features

ABSTRACT

A method of detecting anomalies in a communication system, includes: providing a first packet flow portion and a second packet flow portion; extracting samples of a numerical feature associated with a traffic status of the first and second packet flow portions; computing from said extracted samples a first statistical dispersion quantity and a second statistical dispersion quantity of the numerical feature associated with the first and second packet flow portions, respectively; computing from the dispersion quantities a variation quantity representing a dispersion change from the first packet flow portion to the second packet flow portion; comparing the variation quantity with a comparison value; and detecting an anomaly in the system in response to said comparison.

BACKGROUND

1. Technical Field

The present invention relates to anomaly detection on packet switchedcommunication systems. Particularly, the present invention is related tostatistical methods for detecting network traffic anomalies due tonetwork attacks or to communication system failures.

2. Description of the Related Art

Several types of attacks are known, such as: (distributed) denial ofservice ((D)DoS) attacks, scanning attacks, SPAM or SPIT attacks, andmalicious software attacks.

Denial-of-Service (DoS) attacks and, in particular, distributed DoS(DDoS) attacks are commonly regarded as a major threat to the Internet.A DoS attack is an attack on a computer system network that causes aloss of service or network connectivity to legitimate users, that is,unavailability of services. Most common DoS attacks aim at exhaustingthe computational resources, such as connection bandwidth, memory space,or CPU time, for example, by flooding a target network node by valid orinvalid requests and/or messages. They can also cause disruption ofnetwork components or disruption of configuration information, such asrouting information, or can aim at disabling an application making itunusable. In particular, the network components (e.g., servers, proxies,gateways, routers, switches, hubs, etc.) may be disrupted by malicioussoftware attacks, for example, by exploiting buffer overflows orvulnerabilities of the underlying operating system or firmware.

A DDoS attack is a DoS attack that, instead of using a single computeras a base of attack, uses multiple compromised computers simultaneously,possibly a large or a very large number of them (e.g., millions), thusamplifying the effect. Altogether, they flood the network with anoverwhelming number of packets which exhaust the network or applicationresources. In particular, the packets may be targeting one particularnetwork node causing it to crash, reboot, or exhaust the computationalresources. The compromised computers, which are called zombies, aretypically infected by malicious software (worm, virus, or Trojan) in apreliminary stage of the attack, which involves scanning a large numberof computers searching for those vulnerable. The attack itself is thenlaunched at a later time, either automatically or by a direct action ofthe attacker.

(D)DoS attacks are especially dangerous for Voice over IP (VoIP)applications, e.g., based on the Session Initiation Protocol (SIP). Inparticular, the underlying SIP network dealing only with SIP signallingpackets is potentially vulnerable to request or message floodingattacks, spoofed SIP messages, malformed SIP messages, and reflectionDDoS attacks. Reflection DDoS attacks work by generating fake SIPrequests, as an example, with a spoofed (i.e. simulated) source IPaddress which falsely identify a victim node as the sender, and bysending or multicasting said SIP requests to a large number of SIPnetwork nodes, which all respond to the victim node, and repeatedly soif they do not get a reply, hence achieving an amplification effect.

SPAM attacks consist in sending unsolicited electronic messages (e.g.,through E-mail over the Internet), with commercial or other content, tonumerous indiscriminate recipients. Analogously, SPIT (SPam overInternet Telephony) attacks consist in sending SPAM voice messages inVOID networks. Malicious software attacks consist in sending malicioussoftware, such as viruses, worms, Trojan, or spyware, to numerousindiscriminate recipients, frequently in a covert manner. Scanning orprobing attacks over the Internet consist in sending request messages inlarge quantities to numerous indiscriminate recipients and to collectthe information from the provoked response messages, particularly, inorder to detect vulnerabilities to be used in subsequent attacks. Forexample, in port scanning attacks, the collected information consists ofthe port numbers used by the recipients.

Attack detection techniques are known which utilize a description(signature) of a particular attack (e.g., a virus, worm, or othermalicious software) and decide if the observed traffic data isconsistent with this description or not; the attack is declared in thecase of detected consistency.

Furthermore, anomaly detection techniques are known which utilize adescription (profile) of normal/standard traffic, rather than anomalousattack traffic, and decide if the observed traffic data is consistentwith this description or not; an attack or anomalous traffic is declaredin the case of detected inconsistency.

Unlike attack detection techniques, anomaly detection techniques do notrequire prior knowledge of particular attacks and as such are inprinciple capable of detecting previously unknown attacks. However, theytypically have non-zero false-negative rates, in a sense that they canmiss to declare an existing attack. They also typically have higherfalse-positive rates, in a sense that they can declare anomalous trafficin the case of absence of attacks.

Anomaly detection techniques can essentially be classified into twocategories: rule-based techniques and statistic-based or statisticaltechniques. Rule-based techniques describe the normal behavior in termsof certain static rules or certain logic and can essentially bestateless or stateful. In particular, such rules can be derived fromprotocol specifications.

On the other hand, statistical anomaly detection techniques describe thenormal behavior in terms of the probability distributions of certainvariables, called statistics, depending on the chosen data features orparameters.

Paper “DDoS detection and wavelets”, L. Li and G. Lee, TelecommunicationSystems-Modeling, Analysis, Design and Management, vol. 28, no. 3-4, pp.435-451, 2005, discloses a method comprising the step of dynamicallyapplying a discrete wavelet transform to overlapping sliding windows ofthe byte rate curves in time and looking for sudden changes in thelogarithms of the associated energy distribution coefficients in orderto detect DDoS attacks.

US-A-2004-0220984 describes a method wherein the packet and byte ratesare considered as functions of time and, at each time, the mean valuesand variances of these rates are estimated by using historical data,possibly as Exponentially Weighted Moving Averages (EWMAs), and then agiven sample of traffic at a given time is classified by comparing itspacket and byte rates with a threshold being proportional to the sum, atthe given time, of the historical mean value and the historical standarddeviation (i.e., the square root of the variance) multiplied by apositive constant. Anomalous traffic is declared if the threshold isexceeded, i.e., if the observed sample of traffic is classified as anoutlier.

U.S. Pat. No. 6,601,014 B1 discloses a method where the mean value andthe variance are estimated as the EWMAs, with different, but mutuallyrelated associated constants.

Article “EWMA techniques for computer intrusion detection throughanomalous changes in event intensity”, N. Ye, C. Borror, and Y. Zhang,Qual. Reliab. Engng. Int., vol. 18, pp. 443-451, 2002, describes amethod wherein EWMA techniques are applied for dynamically estimatingthe mean values and variances of the event intensity process derivedfrom the audit trail data describing the activities on a host machine ina computer network. Anomaly detection is based on the outlierclassification principle, where the thresholds are determined undercertain probabilistic models for the event intensity process.Alternatively, anomaly detection is based on the estimated varianceonly, which is compared with a reference value and an alert is thendeclared if the ratio of the two values is too large or too small.

Paper “Statistical traffic identification method based on flow-levelbehavior for fair VoIP service”, T. Okabe, T. Kitamura, and T. Shizuno,Proceedings of the 1st IEEE Workshop on VOID Management and Security,Vancouver, Canada, April 2006, pp. 33-38, describes a flowidentification method, for VOID media traffic, using the flow statisticssuch as the minimal and maximal values of the packet inter-arrival timeand some characteristics of the packet size distribution comprising theminimal, maximal, average, and median values as well as the total numberof different packet sizes occurring in a flow. The statistics arecalculated and compared with reference patterns on short time intervals(e.g., 1 second long) and the verification results are averaged over alonger time interval in order to classify a given flow.

Article “Load characterization and anomaly detection for voice over IPtraffic”, M. Mandjes, I. Saniee, and A. L. Stolyar, IEEE Transactions onNeural Networks, vol. 16, no. 5, pp. 1019-1026, September 2005,describes a method relating to VOID data traffic that consists incomputing the empirical variance estimates of the normalized byte rateon overlapping windows and comparing them with predicted variances thatare theoretically obtained under probabilistic models for the number ofcalls per second. At any time, an anomaly is declared if the ratio ofthe empirical and theoretical variances is greater than a threshold,which falls in the range between one and two.

BRIEF SUMMARY

The Applicant has observed that the known solutions are not satisfactorywith respect to the achieved false-negative and false-positive rates,computational complexity and memory requirements. This could be due tothe fact that it is difficult for the normal traffic in communicationsnetworks to be described by stable probability distributions. Moreover,it is difficult to define statistical models describing thecommunication networks that would give rise to sufficiently lowfalse-positive and false-negative rates. It should be also noticed thatthe complexity of the statistical methods of the prior art techniquesmay be unacceptably high for high-speed and high-volume communicationsnetworks.

The Applicant has noticed that there is a need in the field forachieving an anomaly detection method providing increased reliabilityand, preferably, reduced computational complexity and memoryrequirements. In accordance with a particular embodiment, the Applicanthas observed that advantages can be obtained by monitoring thestatistical behavior of numerical packet features associated with twopacket flow portions lying in corresponding time windows that are movingin time. A numerical packet feature (as an example, the byte rate) isany quantity extracted from network packets that can be expressed asnumerical data by a real, rational, or integer number in such a way thatthe feature values can be regarded as mutually close if the differenceof the corresponding numbers is relatively small in absolute value.

An object of the present invention is a method of detecting anomalies asdefined by the appended independent claim 1. Preferred embodiments ofthis method are defined in the dependent claims 2-21. According toanother aspect, the present invention also relates to an apparatus ofdetecting anomalies in a packet switched communication system, such asdefined in claim 22 and a preferred embodiment thereof defined in thedependent claim 23. A further object of the present invention is apacket switched communication system as defined by claim 24. Inaccordance with another aspect, the invention relates to a computerprogram product as defined by claim 25.

BRIEF DESCRIPTION OF THE DRAWINGS

The characteristics and the advantages of the present invention will bebetter understood from the following detailed description of embodimentsthereof, which is given by way of illustrative and non-limiting examplewith reference to the annexed drawings, in which:

FIG. 1 schematically shows an example of a packet switched communicationsystem comprising a detection apparatus, in accordance with theinvention;

FIG. 2 illustrates by a flow chart an example of a method of detectingtraffic anomaly, in accordance with the invention;

FIG. 3 shows a first embodiment of said example of the method ofdetecting traffic anomaly;

FIG. 4 shows a curve representing the behavior of data generated by acomputer simulation;

FIG. 5 shows the detection of anomalous traffic for the data of FIG. 4obtained with a simulation of said first embodiment;

FIG. 6 shows two exemplary time windows defined in accordance with saidfirst embodiment;

FIG. 7 illustrates two exemplary time windows defined in accordance witha second embodiment of said example of the method for detecting trafficanomalies;

FIG. 8 illustrates three exemplary time windows defined in accordancewith a third embodiment of said example of the method for detectingtraffic anomalies;

FIG. 9 shows the detection of anomalous traffic for the data of FIG. 4obtained with a simulation of said third embodiment;

FIG. 10 shows the detection of anomalous traffic for the data of FIG. 4obtained with a simulation of a fourth embodiment of said example of themethod for detecting traffic anomalies;

FIG. 11 illustrates by a flow chart a fifth embodiment of said exampleof the method for detecting traffic anomalies.

DETAILED DESCRIPTION

Hereinafter, a communication system and several embodiments of astatistical anomaly detection method will be described. In particular,the anomalous traffic to be detected can be due to (D)DoS attacks, SPAMand/or SPIT attacks, scanning attacks, as well as malicious softwareattacks. It should be noticed that the teachings of the presentinvention can also be applied to detect anomalous traffic due tofailures in hardware apparatuses or in software modules operating in thecommunication system.

FIG. 1 shows schematically an example of a communication system 100 inwhich a method for detecting anomalous traffic can be implemented. Thecommunication system 100 is a packet switched network and, for example,is a conventional Internet type network employing the IP protocol.Particularly, the communication system 100 can be a SIP (SessionInitiation Protocol) network for transmitting SIP signaling packets suchas, according to an example, packets relating to VoIP (Voice overInternet Protocol) traffic. The communication system 100 comprises afirst end system S1 (e.g., a server apparatus), a second end system C1and a third end-system C2 which are client apparatuses. Moreover, theexemplary communication system 100 is provided with a first router R1, asecond router R2, a third router R3, and a fourth router R4, each ofthem suitable to extract the destination of a received packet, selectthe best path to that destination, and forward packets to the nextdevice along this path. Routers R1-R4 and end systems S1, C1, and C2 areconnected by channels of the guided type (such as fiber optic cables,coaxial cables, or other guided transmission means) or are connectableby radio signals. The teachings of the invention are applicable to acommunication system having a different number of end systems androuters. The other apparatuses normally employed in a packet switchednetwork and, particularly, the Internet, such as modems, switches, hubs,bridges, gateways, repeaters, and multiplexers, are not shown as theyare evident to the skilled person.

The particular communication system 100 illustrated in FIG. 1 includes adetection apparatus 101, such as a processor and, particularly, a hostcomputer, which is suitable to implement the anomaly detection method tobe described hereinafter. According to an example, the detectionapparatus 101 comprises a central processing unit, a work memory, a massmemory (such as a hard disk), user interfaces, as well as input andoutput ports allowing a communication with other devices (particularly,routers R1-R4) of the communication system 100. All these components arenot shown in FIG. 1. As indicated in FIG. 1, the detection apparatus 101also comprises: a data collection module 102, as an example connectedwith at least one of the routers R1-R4, an optional flow aggregationmodule 103, a statistical analysis module 104, and an alarm generationmodule 105. Examples of functions performed by the modules 102-105 willbe described in greater detail later. Modules 102-105 can be hardwareand/or software components and the detection apparatus can be locatedremotely from each of the routers R1-R4 or can be located by one of suchrouters, as an example, integrated in one of the routers R1-R4.According to another embodiment, the functions performed by modules102-105 can be distributed among different suitable devices belonging tothe communication system 100 or connected to such a system.

As known, the Open Systems Interconnection Basic Reference Model (OSIReference Model or OSI Model for short) is a layered, abstractdescription for communications and computer network protocol design. Itis also called the OSI seven layer model since it defines the followinglayers: application (7), presentation (6), session (5), transport (4),network (3), data link (2), and physical (1).

Layers 3 and 4 (the network and transport layers, respectively) includethe following information of an IP packet: source IP address, TCP/UDP(Transmission Control Protocol/User Datagram Protocol) source portnumber, destination IP address, TCP/UDP destination port number, andtransport protocol used (e.g., TCP or UDP). A series of packets havingin common the above listed information is defined as a (network) “flow”.

Example of an Anomaly Detection Method

FIG. 2 shows by means of a flow chart an example of a method 200employable for the detection of anomaly on the communication system 100,in accordance with the invention. As an example, this method can beimplemented by the detection apparatus 101. The method 200 initiallycomprises a symbolic start step 201 (START) and an input step 202(INPUT) wherein a first packet flow portion PFP1 and a second packetflow portion PFP2 are taken from the communication system 100, separatedfrom each other, stored, and then provided as input to the subsequentsteps of the method described. The first (PFP1) and the second (PFP2)packet flow portions are possibly defined (i.e. included) in elementarytime intervals of length/duration ΔT. According to a first embodiment,the two flow portions PFP1 and PFP2 belong to the same network flow, asdefined above.

Subsequently, in an extracting step 203 (EXTRACT), samples (x_(i))_(i)of a numerical feature x, associated to and describing a traffic statusof the first flow portion PFP1 are extracted. Samples (x_(i))₂ of thenumerical feature x of the second packet flow portion PFP2 are alsoextracted. A numerical packet feature is any quantity extracted fromnetwork packets that can be expressed as numerical data by a real,rational, or integer number. According to this definition, it ismeaningful to measure the distance or closeness between two numericalfeature values by the Euclidean metric. Particularly, but notnecessarily, a numerical packet feature may relate to and provide anindication about the traffic volume, i.e., the data amount transportedby a packet or a packet flow portion. The definitions of some specificnumerical packet features which can be extracted from packets are thefollowing:

-   -   “size in bytes” of an IP packet is the total number of layer 3        bytes in a packet;    -   “total number of packets” (N_(packet)) and “total number of        layer 3 bytes” (N_(byte)) in a considered elementary time        interval of length ΔT; these two features are statistics (i.e.,        numerical data) regarding a flow;    -   “average packet size” in an interval of length ΔT, in bytes, is        computed as N_(size)=N_(byte)/N_(packet), provided that        N_(packet)>0;    -   “packet rate” R_(packet)=N_(packet)/ΔT is the number of packets        per second;    -   “byte rate” R_(byte)=N_(byte)/ΔT is the number of bytes per        second.        The average packet size can also be expressed as        N_(size)=R_(byte) R_(packet). The reciprocal of R_(packet) is        the average inter-arrival time between two successive packets in        a flow.

It is observed that the length ΔT essentially specifies the timeresolution with which the traffic is monitored and analyzed and can bestatic or dynamic. The starting and ending times of the first and thelast packet in a flow, respectively, as well as the total number ofmonitored flows can also be extracted in step 203. The basic numericalfeatures described above, which are based on the information containedin layers 3 and 4 of packet headers are already available in commercialproducts used in IP networks such as routers and switches (e.g., thewell-known Netflow data).

Furthermore, it is noticed that the extracting step 203 can be performedby a hardware and/or software extractor module 106 included in each oronly in some of the routers R1-R4 or in other network nodes that arearranged to extract and transmit the extracted numerical features to thedata collection module 102 of the detection apparatus 101 (FIG. 1).

In a computing step 204 (DISPERSION), a first statistical dispersionquantity Dq_(i) of the numerical feature x associated with the firstpacket portion PFP1 is computed on the basis of the correspondingnumerical feature samples (x_(i))₁. Moreover, a second statisticaldispersion quantity Dq₂ of the numerical feature x associated with thesecond packet portion PFP2 is computed on the basis of the correspondingnumerical feature samples (x_(i))₂.

A statistical dispersion quantity of a set of numerical data is ameasure of how the observed numerical values in the data set aredispersed from each other with respect to the Euclidean or other relatedmetrics among real numbers. Particularly, a statistical dispersionquantity is a real number that is equal to zero if all the data valuesare identical, and generally increases as the data values become moredispersed. Examples of statistical dispersion quantity are: the variancedefined as the mean squared deviation of the numerical values from theirarithmetic mean value; the standard deviation defined as the square rootof the variance; the mean absolute deviation of the numerical valuesfrom their arithmetic mean value; the minimum mean squared deviation ofthe numerical values from any affine approximation, where the optimalaffine approximation of numerical data, minimizing this mean squareddeviation, can be determined by linear regression techniques as will beclarified later.

According to a particular embodiment, the first Dq₁ and second Dq₂statistical dispersion quantities are a first variance σ₁ ² and a secondvariance σ₂ ², respectively. The computing 204 of the first and secondstatistical dispersion quantities can be performed, according to theexample, by the statistical analysis module 104.

In a further computing step 205 (VARIATION), a variation quantity Δ iscomputed from the first statistical dispersion quantity (Dq₁) and thesecond statistical dispersion quantity (Dq₂). The variation quantity Δmeasures a statistical variation or change between the first statisticaldispersion quantity (Dq₁) associated with the first packet flow portionPFP1 and the second statistical dispersion quantity (Dq₂) associatedwith the second packet flow portion PFP2. Preferably, the expected valueof the variation quantity Δ should be relatively small if the firstpacket flow portion PFP1 and the second packet flow portion PFP2 areboth drawn from a same probability distribution. Particularly, thevariation quantity can be related to a difference between said firstvariance σ₁ ² and said second variance σ₂ ² in which case it measures achange of variance for the two packet flow portions observed. Thecomputation of the variation quantity Δ can be carried out by thestatistical analysis module 104.

The variation quantity Δ is compared, in a comparison step 206(COMPARE), with comparison value such as a threshold value Thr.According to said comparison step 206, if the threshold value Thr isexceeded, then an anomaly is detected (branch Yes) and an alarm signalALARM is generated in an alarm issuing step 207. If the threshold valueThr is not exceeded, then an anomaly is not detected (branch No) and analarm signal ALARM is not generated. Particularly, the comparison andthe alarm generation step 207 can be performed by the above mentionedalarm generation module 105. The threshold can be static or dynamic andcan be determined on the basis of historical data. In particular, adynamic threshold can change adaptively.

Following a positive (Yes) or negative (No) anomaly detection, thedetection method 200 can be repeated in connection with further packetflow portions. Particularly, the further packet flow portions can lie intime intervals whose end points are delayed with respect to the ones inwhich the first (PFP1) and second (PFP2) packet flow portions wereincluded. Even more particularly, the further packet flow portions canlie in time intervals whose both start and end points are delayed withrespect to the ones in which the first (PFP1) and second (PFP2) packetflow portions were included.

Due to the fact that the computation of the dispersion quantities(particularly, the two variances) is performed in delayed timeintervals, the method 200 is also called “the moving variance method”.It should be noticed that for each monitored flow portion, not only asingle numerical packet feature, but also a plurality of numericalpacket features can be extracted, stored, and analyzed. For example, thefollowing features can be considered: R_(packet), R_(byte), andN_(size). It is observed that any two of the numerical featuresR_(packet), R_(byte), and N_(size) are mutually independent.

Any such feature can be used to detect a respective anomaly. Inparticular, the average packet size N_(size) is preferable for detectinganomalous traffic comprising repeated transmission of essentially thesame or similar packets (e.g., packets with the same payload), becausein this case N_(size) changes its probability distribution over timewith respect to normal traffic, e.g., its variance over time may bereduced. For example, if a message flooding (D)DoS attack is in progresson a SIP network, then it may be likely that a particular type of SIPmessages/packets (e.g., INVITE, RE-INVITE, BYE, or REGISTER) is (much)more frequent than the others.

Moreover, in addition to the average packet size N_(size) also thepacket rate R_(packet) is monitored and involved into the anomalydetection method 200. For most anomalous traffic, such as the request ormessage flooding and reflection DDoS traffic, the traffic volume isincreased and this is reflected in an increased value of R_(packet).However, an increased R_(packet) can also be caused by normal trafficsuch as flash crowds. Also, in case of DDoS attacks, the traffic volumeis high near the target, but may be low near the distributed sources ofthe attack. Therefore, it is preferable to employ not only the packetrate R_(packet) but also at least one additional packet feature.

The detection criteria for a detection method employing a plurality ofnumerical packet features will be described in greater detail later,with reference to a fifth embodiment and to FIG. 11.

The features N_(size) and R_(packet) or other can numerical packetfeatures c be traced in time at a chosen network node (i.e., a router)or a set of nodes, for each individual flow or for certain selectedflows, e.g., according to the packet rate. Alternatively, in accordancewith a particular example, the numerical feature values for individualflows can be aggregated in groups according to selected packetparameters such as the source or destination IP addresses or the sourceor destination port numbers. For example, the flows can be grouped forthe same source IP address or the same destination IP address. In theformer case, the flow statistics correspond to the outbound traffic froma particular network node, and in the latter, they correspond to theinbound traffic to a particular network node. The number ofsimultaneously monitored flows with the same IP address as thesource/destination address indicates the activity of a node with that IPaddress as the source/destination node in the observed time interval,respectively. The detection method 200 can be applied to any group ofaggregated numerical packet feature values. The feature grouping stepcan be performed by the flow aggregation module 103 (FIG. 1).

Alternatively, the features for all the flows monitored can be groupedtogether, in particular, by distinguishing the direction of flows,regardless of the particular source/destination IP addresses. This typeof grouping is interesting for a high level analysis which does not payattention to particular nodes or users, but rather to the networktraffic as a whole. Instead of the IP addresses, the features groupingcan be made according to the port numbers, which are indicative of theapplications of the packets transmitted.

With reference to the numerical feature selection, is also possible toextract and use information contained in other layers such as theapplication layer (layer 7), in addition to the numerical packetfeatures extracted from layers 3 and 4. In particular, the layer 7information can be used to aggregate the numerical features. Forexample, for SIP packets, the type of packet being transmitted or asource or destination SIP URI (Universal Resource Identifier) can beextracted and then used for aggregating the basic numerical packetfeatures related to layers 3 and 4. In some applications, certainstatistics from the payloads, e.g., related to the relative frequency ofpatterns can also be extracted and used as numerical packet features.

First Embodiment

A first embodiment 300 of the detection method 200 (i.e., “the movingvariance method”), is described herein below with reference to FIG. 3.This first embodiment is hereinafter also called “the sliding variancemethod”. Numerical feature x is one of the above identified numericalfeatures, expressed by a real, rational, or integer number, extractedfrom flows of network packets, at a chosen network node, in (elementary)short time intervals of length ΔT, in some time units (e.g., ΔT iscomprised in 1 s-5 min), where ΔT can vary in time. In particular, asindicated above, x can be the packet rate R_(packet), the average packetsize N_(size), or the byte rate R_(byte). For R_(packet) and R_(byte),the zero values are allowed, if there are no packets with chosenfeatures in the considered short time interval, whereas N_(size) has tobe positive and it is not defined if in the considered short timeinterval there are no packets with chosen features. The correspondingsequence of samples of the feature x taken in time is denoted as(x_(i))_(i=1) ^(∞)

As regards step 202 of FIG. 2, wherein the two packet portions PFP1 andPFP2 are defined, the method 200 is based on sliding windows, i.e., timeintervals that are sliding in time. Particularly, a time interval of alength T (greater than ΔT) that is sliding in time, is defined. The timeinterval of length T, starting from an initial position, each timeadvances τ units of time (delay), where τ is a given parameter. If ΔT isstatic, i.e., fixed, then T and τ can be defined as fixed integermultiples of ΔT. If ΔT is dynamic, i.e., variable in time, then it isassumed that each sliding window contains an integer number of shorttime intervals and approximately has the same length T.

Accordingly, two successive windows of (approximately) the same length Tare shifted τ units of time from each other and hence overlap over T−τunits of time. In this embodiment, at any given time, the packet flowportion PFP1 then corresponds to a sliding window at this time and thepacket flow portion PFP2 corresponds to the next sliding window, delayedby τ. It should be noted that samples of the numerical features x can betaken irregularly in time, i.e., in time intervals of possibly variablelength ΔT. In this case, the number of samples per sliding window mayvary in time, and so do the numbers of overlapping and non-overlappingsamples in two successive sliding windows. FIG. 6 schematicallyillustrates two time intervals corresponding to two exemplary slidingwindows: the first window W1 extends from time 0 to T and the secondwindow W2 extends from time τ to T+τ.

As shown by means of functional blocks in FIG. 3, in a storing step 303,a segment of numerical features samples

(x _(i))_(i=m) _(j) _(−n) _(j) ₊₁ ^(m) ^(j)   (1)

corresponding to a j^(th) sliding window of length T (e.g., window W1 ofFIG. 6), such as the one associated with the first packet flow portionPFP1 of step 202 in FIG. 2, is stored. The indexes in expression (1)represent the following:

-   -   i indicates a sample number,    -   j indicates a window number,    -   m_(j) indicates the end point of a window,    -   n_(j) indicates the number of samples in a window.        The number of samples n_(j) in the window/segment is in general        variable.

In a computing step 304, the following statistical quantities arecomputed:

$\begin{matrix}{S_{j} = {\sum\limits_{i = {m_{j} - n_{j} + 1}}^{m_{j}}x_{i}}} & (2) \\{\mu_{j} = \frac{S_{j}}{n_{j}}} & (3) \\{\Sigma_{j}^{2} = {\sum\limits_{i = {m_{j} - n_{j} + 1}}^{m_{j}}{\left( {x_{i} - \mu_{j}} \right)^{2}.}}} & (4)\end{matrix}$

Here, S^(j) is the summation of the samples, μ_(j) is the mean value,and Σ_(j) ² is the summation of the squared distances/deviations of thesamples from the mean value. The first variance σ_(j) ² is then computedaccording to the following expression:

$\begin{matrix}{\sigma_{j}^{2} = {\frac{\Sigma_{j}^{2}}{n_{j}}.}} & (5)\end{matrix}$

According to a further embodiment, for obtaining unbiased estimates ofvariance, it is possible to divide by n_(j)−1 instead of n_(j). Thefirst variance σ_(j) ² as a measure of dispersion of numerical samplesin the considered sliding window is thus associated with the firstpacket flow portion PFP1.

Analogously, a segment of numerical features samples

(x _(i))_(i=m) _(j+) _(−n) _(j+1) ^(m) ^(j+1)   (6)

corresponding to a (j+1)^(th) sliding window of length T (e.g., secondwindow W2 of FIG. 6), such as the one associated with the second packetflow portion PFP2 of step 202 in FIG. 2, is stored. In a computing step304, the mean value μ_(j+), and the second variance σ_(j+1) ² associatedwith the second sliding window of the second packet flow portion PFP2are then computed applying the expressions (2)-(5) to the segment (6).

In a computing step 305, the variation quantity Δ is computed from saidfirst σ_(j) ² and second σ_(j+1) ² variances. According to the exampledescribed, the variation quantity is a relative squared differenceδ_(j+1) of variances for the two successive segments (1) and (6) and canbe computed by the following formula:

$\begin{matrix}{\delta_{j + 1} = {\frac{\left( {\sigma_{j + 1}^{2} - \sigma_{j}^{2}} \right)^{2}}{\sigma_{j + 1}^{2}\sigma_{j}^{2}}.}} & (7)\end{matrix}$

According to another particular example, the relative squared differenceδ_(j+1) can be obtained by the following equivalent formula:

$\begin{matrix}{\delta_{j + 1} = {\frac{\left( {{n_{j}\Sigma_{j + 1}^{2}} - {n_{j + 1}\Sigma_{j}^{2}}} \right)^{2}}{n_{j}\Sigma_{j + 1}^{2}n_{j + 1}\Sigma_{j}^{2}}.}} & (8)\end{matrix}$

If formula (8) is applied, then the variances according to expression(5) are not explicitly computed. However, it should be noticed thatcomputation (7) in terms of the variances (i.e., mean squared deviationsor normalized Σ² values) is less sensitive to numerical errors than thecomputation (8) in terms of the Σ² values. It is observed that thequantity Σ², as a total squared deviation, is also a statisticaldispersion quantity.

In a comparison step 306, the relative squared difference δ_(j+1) isthen compared with a fixed or dynamic threshold θ_(j+1), where generallythe threshold increases as T/τ decreases. If the threshold is exceededonce or repeatedly a specified number of times in successivecomputations, where this number may increase as the ratio T/τ decreases,then an alarm ALARM for anomalous traffic is generated in an alarm step307.

FIG. 4 and FIG. 5 refer to a computer simulation. The curve of FIG. 4shows the sample data versus the sample indexes i, 0-800, whereas theresulting curve of relative squared differences δ of variances for thesliding variance method applied to data of FIG. 4 is shown in FIG. 5. Inthis example, the data consists of 801 samples, where the samplesnumbered 0-400 are produced as independent samples according to theGaussian distribution with the mean value 20000 and the variance 2000,whereas the samples numbered 401-800 are produced as independent samplesaccording to the Gaussian distribution with the same mean value 20000and the increased variance 64000. In FIG. 5, the window duration used isT=20 data samples and the time resolution is τ=5 data samples. It isobserved that an alert is issued at the rising edge of the curve of FIG.5 and such rising edge is particularly sharp-cut.

With reference to the threshold definition and according to an example,the threshold θ may be a fixed value as it relates to the relativeinstead of absolute change of variances, and not to the change of meanvalues, which is expected to be considerable even for normal traffic.More precisely, if the samples are drawn independently according to thesame probability distribution, then, even if two successive segments arenot overlapping (i.e., if τ=T), the relative squared difference ofvariances δ is bounded most of the time by a small value inverselyproportional to the number of samples in a segment, independently of thevariance of the samples.

In accordance with another example, to account for changes of variancein normal traffic, the threshold θ could be determined possibly fromhistorical data for normal traffic, at a considered network node, inorder to keep the false positive rate reasonably low. In particular, itmay depend on the time of the day. Particularly, the threshold can bechosen irrespectively of statistical model estimating the trafficbehavior.

Given an appropriate value of the threshold θ, it is then expected thatthe probability that the threshold is exceeded, i.e., the false-positiverate is low for normal traffic, whereas at times where there is a changefrom normal traffic to anomalous traffic, it is expected that thethreshold is not exceeded with a low probability, i.e., with a lowfalse-negative rate. It is noticed that the method is robust as thechanges are related to variances and not to the mean values. Moreover,it is robust since the method 300 exploits the relative, not absolutechanges in variance. However, in some applications, absolute changes invariance can be used, but then the thresholds may be variable.

It should be noticed that the value of the delay or shift τ determinesthe resolution of the above proposed statistical anomaly detectionmethod 300, because it takes τ units of time, or a small multiple of τunits of time, in order to detect a change from normal to anomaloustraffic. Preferably, the value of T should be large enough in order toobtain relatively stable estimates of variance so that for normaltraffic the relative changes of variance are not too large. On the otherhand, the ratio T/τ should not be too large so that the change oftraffic from normal to anomalous does not require a very small thresholdθ to be detected. For example, the ratio T/τ may be chosen so as tosatisfy the following expression:

1≦T/τ≦10  (9)

In particular, it is suggested that using the average packet sizeN_(size) as the packet feature x is especially useful for detectinganomalous traffic, e.g., due to (D)DoS attacks, port scanning attacks,SPAM and SPIT attacks, as well as malicious software attacks. In thiscase, in addition to detecting a considerable relative change invariance, a decrease of the variance can be detected as well.

Second Embodiment

According to a second example of the detection method 200, the twosuccessive windows are defined in a different way with respect to thefirst embodiment.

According to this second example, in step 202 at time j+1, the followingfirst and second sample segments corresponding to packet flow portionsPFP1 and PFP2, respectively, are considered:

(x _(i))_(i=m) _(j+1) _(−n) _(j+1) ₁ ^(m) ^(j)   (10)

(x _(i))_(i=m) _(j+1) _(−n) _(j+1) ₁ ^(m) ^(j)   (11)

where the first segment (10) is the initial part of the second segment,without the ending part (x_(i))_(i=m) _(j) ₊₁ ^(m) ^(j+1) or,equivalently, the last part of the preceding segment (x_(i))_(i=m) _(j)_(−n) _(j) ₊₁ ^(m) ^(j) , without the initial part (x_(i))_(i=m) _(j)_(−n) _(j) ₊₁ ^(m) ^(j+1) ^(−n) ^(j+1)

FIG. 7 shows schematically two successive sliding windows W1 and W2, asin FIG. 6, with the difference that the first packet flow portion PFP1is now determined by a shortened window W1′, extending from τ to T,whereas the second packet flow portion PFP2 is determined by W2 as inFIG. 6. In this way, the past data leaving the current sliding windoware thus excluded from the variance comparison.

Moreover, in step 204 the first variance {circumflex over (σ)}_(j) ²measuring the dispersion of the first, shortened sample segment (10) iscomputed. In accordance with this second example, the followingexpressions can be employed:

$\begin{matrix}{{\hat{S}}_{j} = {\sum\limits_{i = {m_{j + 1} - n_{j + 1} + 1}}^{m_{j}}x_{i}}} & (12) \\{{\hat{\mu}}_{j} = \frac{{\hat{S}}_{j}}{n_{j + 1} - m_{j + 1} + m_{j}}} & (13) \\{{\hat{\Sigma}}_{j}^{2} = {\sum\limits_{i = {m_{j + 1} - n_{j + 1} + 1}}^{m_{j}}\left( {x_{i} - {\hat{\mu}}_{j}} \right)^{2}}} & (14) \\{{\hat{\sigma}}_{j}^{2} = {\frac{{\hat{\Sigma}}_{j}^{2}}{n_{j + 1} - m_{j + 1} + m_{j}}.}} & (15)\end{matrix}$

Moreover, also in step 204, the second variance τ_(j+1) ² measuring thedispersion of the second samples segment (11) is then computed in thesame way as in the first embodiment, i.e., by using expressions (2)-(5).Particularly, the second variance σ_(j+1) ² is given by the followingformula:

$\begin{matrix}{\sigma_{j + 1}^{2} = {\frac{\Sigma_{j + 1}^{2}}{n_{j + 1}}.}} & (16)\end{matrix}$

In step 205, the relative squared difference of variances of the twosegments is then computed as:

$\begin{matrix}{\delta_{j + 1} = {\frac{\left( {\sigma_{j + 1}^{2} - {\hat{\sigma}}_{j}^{2}} \right)^{2}}{\sigma_{j + 1}^{2}{\hat{\sigma}}_{j}^{2}} = {\frac{\left( {{\left( {n_{j + 1} - m_{j + 1} + m_{j}} \right)\Sigma_{j + 1}^{2}} - {n_{j + 1}{\hat{\Sigma}}_{j}^{2}}} \right)^{2}}{\left( {n_{j + 1} - m_{j + 1} + m_{j}} \right)\Sigma_{j + 1}^{2}n_{j + 1}{\hat{\Sigma}}_{j}^{2}}.}}} & (17)\end{matrix}$

As is clear from expression (17), similarly as in the first embodiment,the relative squared difference of variances δ_(j+1) can be obtainedwith or without the explicit computation of the variances, i.e., byusing the normalized Σ² values or the Σ² values without normalization.As indicated in step 206, the relative squared difference of variancesδ_(j+1) is then compared with a threshold, where the threshold may besomewhat reduced in comparison with the one of the sliding variancemethod 300.

Namely, for normal traffic, the variances for the two segments underconsideration (such as the windows W2 and W l′ in FIG. 7) are thenexpected to be less mutually different than in the sliding variancemethod 300 (e.g., using the windows W2 and W1 in FIG. 6).

It should be observed that this second embodiment of the method 200 maybe more suitable than the first embodiment 300 for detecting anomaloustraffic of duration shorter than the window size T. This is due to thefact that in the first embodiment, a considerable change in variance,due to the beginning of anomalous traffic, would be detected not only bythe ending point of the sliding-window (such as the window W2 in FIG.6), when it enters the segment of anomalous traffic, but also by theinitial part of the sliding window (such as the window W2 in FIG. 6),when it leaves the segment of normal traffic and enters the segment ofanomalous traffic. On the other hand, in the second embodiment, a changeof variance will then be detected only by the ending point of thesliding window (such as the window W2 in FIG. 7).

Third Embodiment

In a third embodiment of the detection method 200, a moving window ofincreasing length is defined. Such moving window extends from a choseninitial time up to the current time, and each time, the ending point ofthe moving window advances τ units of time, where r determines theresolution in time for detecting the anomalous changes in traffic. FIG.8 shows schematically three exemplary successive windows W_(j), W_(j+i),and W_(j+2) drawn in accordance with this third embodiment of thedetection method 200. In FIG. 8, t denotes a generic time.

At each time, the packet flow portions PFP1 and PFP2 correspond to twosuccessive moving windows. Accordingly, for a generic window index j,the packet flow portion PFP1 is defined by the segment

(x _(i))_(i=1) ^(m) ^(j,)   (18)

which is associated with the j^(th) moving window containing m_(j)samples, and the packet flow portion PFP2 is defined by the segment

(x _(i))_(i=1) ^(m) ^(j+1,)   (19)

which is associated with the (j+1)^(th) moving window containing m_(j+1)samples.

In step 204, at each time, i.e., for each moving window, the variance iscomputed by a weighted average technique, e.g., by a standard EWMA(Exponentially Weighted Moving Average) technique.

According to a standard EWMA technique, an iterative-recursivecomputation of the mean value and the variance of the data for every newdata sample is performed. The quantities μ_(k) and σ_(k) ² are theestimated mean value and variance, respectively, of a genericsub-segment (x_(i))_(i=1) ^(k) k initial data samples. The two segmentsdefined by the expressions (18) and (19) given above then correspond tothe values k=m_(j) and k=m_(j+1), respectively. Given two constants αand β, satisfying 0≦α,β≦1, μ_(k) and σ_(k) ² are then iterativelycomputed, for k=1,2, . . . , as:

μ_(k+1) =βx _(k+1)+(1−β)μ_(k)  (20)

σ_(k+1) ²=α(x_(k+1)−μ_(k+1))²+(1−α)σ_(k) ²  (21)

with the initial values μ₁=x₁ and σ₁ ²=0. In a particular case, we mayhave α=β.

The explicit solution for the mean value is given by:

$\begin{matrix}{{\mu_{k} = {{\left( {1 - \beta} \right)^{k - 1}x_{1}} + {\beta {\sum\limits_{i = 2}^{k}{\left( {1 - \beta} \right)^{k - i}x_{i}}}}}},} & (22)\end{matrix}$

and the explicit solution for the variance is given by:

$\begin{matrix}{\sigma_{k}^{2} = {\alpha {\sum\limits_{i = 2}^{k}{\left( {1 - \alpha} \right)^{k - i}{\left( {x_{i} - \mu_{i}} \right)^{2}.}}}}} & (23)\end{matrix}$

According to this embodiment, the variance at time k measures theexponentially weighted average deviation of the initial k data samplesfrom the corresponding mean values at the same times. Using the aboveidentified formulas (20)-(23) the variances of the first and secondsegments (18) and (19) are computed by setting k=m_(j) and k=m_(j+1),respectively.

In the step 205, the relative squared difference of variances for twosegments (x_(i=1) ^(m) ^(j) , and x_(i=1) ^(m) ^(j+1) corresponding totwo successive moving windows, to be compared with a static or dynamicthreshold, is then computed as:

$\begin{matrix}{\delta_{j + 1} = {\frac{\left( {\sigma_{m_{j + 1}}^{2} - \sigma_{m_{j}}^{2}} \right)^{2}}{\sigma_{m_{j + 1}}^{2}\sigma_{m_{j}}^{2}}.}} & (24)\end{matrix}$

Fourth Embodiment

In accordance with a fourth embodiment which is alternative to the abovedescribed third embodiment, the mean value μ_(k) and variance σ_(k) ²are the estimated by a new proposed EWMA technique. According to thistechnique adapted to the moving variance method, μ_(k) and σ_(k) ² areiteratively computed, computed, for k=1,2, . . . , as:

μ_(k+1) =βx _(k+1)+(1−β)μ_(k)  (25)

ξ_(k+1) ²=α(x _(k+1))²+(1−α)ξ_(k) ²  (26)

σ_(k+1) ²=ξ_(k+1) ²−(μ_(k+1))²  (27)

with the initial values μ₁=x₁ and ξ₁ ²=(x₁)², where ξ_(k) ² denotes theestimated second moment of the data on the segment (x_(i))_(i=1) ^(k).In a particular case, α=β.

The explicit solution for the mean value is still given by:

$\begin{matrix}{{\mu_{k} = {{\left( {1 - \beta} \right)^{k - 1}x_{1}} + {\beta {\sum\limits_{i = 2}^{k}{\left( {1 - \beta} \right)^{k - i}x_{i}}}}}},} & (28)\end{matrix}$

whereas, the explicit solution for the second moment is given by:

$\begin{matrix}{\xi_{k}^{2} = {{\left( {1 - \alpha} \right)^{k - 1}\left( x_{1} \right)^{2}} + {\alpha {\sum\limits_{i = 2}^{k}{\left( {1 - \alpha} \right)^{k - i}{\left( x_{i} \right)^{2}.}}}}}} & (29)\end{matrix}$

In the case when α=β, the explicit solution for the variance is givenby:

$\begin{matrix}{\sigma_{k}^{2} = {\alpha {\sum\limits_{i = 2}^{k}{\left( {1 - \alpha} \right)^{k - i}{\left( {x_{i} - \mu_{k}} \right)^{2}.}}}}} & (30)\end{matrix}$

Therefore, according to this fourth embodiment, the variance at time kmeasures the exponentially weighted average deviation of the initial kdata samples from the overall mean value at time k, which corresponds tothe whole moving window at time k. The fourth embodiment is hence moresensitive than the third embodiment with respect to detecting anomalouschanges in traffic data.

For both third and fourth embodiments, the values of the constants α andβ determine the effective number of past samples influencing thevariance and mean value estimates, respectively. More precisely, thesenumbers increase as the constants α and β decrease. In particular, thevalues of the constants α and β close to one, which are typically usedin standard applications of the EWMA technique (e.g., for detecting thedata outliers or for reducing the noise in given data) do not appear tobe suitable for the described detection method, where smaller values ofthe constants α and β are preferred.

In any case, it is preferable to choose the constants α and β inaccordance with the statistical properties of the normal traffic. Ingeneral, the faster the variance variations in normal traffic areexpected, the bigger the constants should be chosen.

Also for the fourth embodiment described with reference formulas(25)-(30), the relative squared difference of variances for two segments(x_(i))_(i=1) ^(m) ^(j) and (x_(i))_(i=1) ^(m) ^(j+1) corresponding totwo successive moving windows, to be compared with a static or dynamicthreshold, is then computed by the above mentioned formula (24).

The Applicant observes that the above description of the methodsapplying the EWMA techniques enables efficient computation, by using thecorresponding recursions. These recursions allow a considerable datamemory reduction in comparison with the first two embodiments.

The Applicant has performed computer simulations relating to the abovedescribed third and fourth embodiments of the detection method 200. Forthe same data shown in FIG. 4, the resulting curves of relative squareddifferences of variances for the third embodiment (formulas (20)-(24))and the fourth embodiment (formulas (25)-(30) and (24)) are depicted inFIG. 9 and FIG. 10, respectively. For both third and fourth embodiments,the constants used are α=β=0.07 and the time resolution is τ=5 datasamples. It should be noticed that an alert is issued at the rising edgeof the respective curves and such rising edges are particularlysharp-cut.

Fifth Embodiment

In a fifth embodiment 400 of the present invention schematically shownin FIG. 11, a number of numerical packet features are employedsimultaneously. In accordance with this fifth embodiment 400 and underthe assumption that the numerical packet features are roughlyindependent, a plurality of N variances associated with differentnumerical packet features are computed. Particularly, steps 201-205 ofthe exemplary method 200 are repeated for each different numericalpacket feature, for j=1, . . . , N.

Moreover, for each considered numerical packet feature, a relativesquared difference of variances δ_(j) is computed, for j=1, . . . , N.In a further step 401 (Σ), the relative squared differences of variancesδ₁, . . . , δ_(N) are combined to obtain a total variation quantityΔ_(tot). According to an example, the combination step is a summation ofsaid relative squared differences of variances δ₁ . . . , δ_(N). Thetotal variation quantity Δ_(tot) is then compared (step 402) with athreshold value Thr in order to detect an anomalous condition which cancause the generation of an alarm signal or message ALARM (branch Yes).The comparison of the total variation quantity Δ_(tot) with thethreshold value Thr could detect a normal traffic condition (branch No)and, in this case, no alarm signals or messages are generated.

It should be noticed that the combination step 401 may be performed bydifferent types of combination algorithms. According to anotherparticular example, prior to performing the summation, the relativesquared differences of variances δ₁, . . . , δ_(N) are multiplied withdifferent weights that may be associated with individual numericalpacket features.

Moreover, different decision criteria may be employed. According to anexample, a total variation quantity Δ_(tot) is not computed andcomparisons of each of the relative squared difference of variances δ₁,. . . , δ_(N) with a respective threshold is performed. An alarm signalis then generated if at least a specified number of said comparisonsdetect an anomalous traffic condition. According to another example, inaddition to the variation quantity criterion, aiming at detecting suddenchanges of variance, one may also take into account other criteria,e.g., for message flooding (D)DoS attacks, one may require that there isalso a significant change of the packet rate R_(packet) or the byte rateR_(byte).

According to yet another example, the result of the comparison of thevariation quantity (such as the relative squared differences ofvariances) with a threshold is logically combined with changes in otherstatistical quantities, e.g., the mean values of the numerical packetfeatures chosen.

The Applicant notices that the use of the relative squared differences δexpressed by formulas (7), (8), (17), and (24) are particularlyadvantageous as it has the capacity for detecting an increase as well asa decrease of variance, as can be readily seen from the followingequivalent expression:

$\begin{matrix}{\delta_{j + 1} = {\frac{\left( {\sigma_{j + 1}^{2} - \sigma_{j}^{2}} \right)^{2}}{\sigma_{j + 1}^{2}\sigma_{j}^{2}} = {\frac{\sigma_{j + 1}^{2}}{\sigma_{j}^{2}} + \frac{\sigma_{j}^{2}}{\sigma_{j + 1}^{2}} - 2.}}} & (31)\end{matrix}$

According to another example of method 200, in all the embodimentsdescribed, the variation quantity 6, (step 205 in FIG. 2) can becomputed as a relative difference adapted for detecting either anincrease of variance (i.e., σ_(j+1) ²>σ_(j) ²), by

$\begin{matrix}{\delta_{j + 1} = \left( {\frac{\sigma_{j + 1}^{2}}{\sigma_{j}^{2}} - 1} \right)^{2}} & (32)\end{matrix}$

or a decrease of variance (i.e., σ_(j+1) ²>σ_(j) ²), by

$\begin{matrix}{\delta_{j + 1} = {\left( {\frac{\sigma_{j}^{2}}{\sigma_{j + 1}^{2}} - 1} \right)^{2}.}} & (33)\end{matrix}$

Sixth Embodiment

According to a sixth embodiment of method 200, applicable to slidingwindows (FIG. 6) described with reference to the first embodiment 300 ormoving windows (FIG. 7) described with reference to the secondembodiment, a statistical dispersion quantity is computed not only as avariance, but in a modified way, which will be illustrated furtherbelow.

More precisely, for a sliding window segment (x_(i))_(i=m) _(j) _(−n)_(j) ₊₁ ^(m) ^(j) given by the expression (1), instead of computing onlythe variance by using the expressions (2)-(5) as in the firstembodiment, a first statistical dispersion quantity ε_(j) ² is computedby the following formula:

$\begin{matrix}{ɛ_{j}^{2} = {\sigma_{j}^{2} - {\frac{\left( {\left( {\frac{1}{n_{j}}{\sum\limits_{i = {m_{j} - n_{j} + 1}}^{m_{j}}{\left( {i - m_{j} + n_{j}} \right)x_{i}}}} \right) - {\mu_{j}\frac{n_{j} + 1}{2}}} \right)^{2}}{\left( {n_{j}^{2} - 1} \right)/2}.}}} & (34)\end{matrix}$

For a shortened sliding window segment given by the expression (10),instead of computing only the variance by using the expressions(12)-(15) as in the second embodiment, a first statistical dispersionquantity {circumflex over (ε)}_(j) ² is computed by the formulaanalogous to (34).

A second statistical dispersion quantity {circumflex over (ε)}_(j+1),associated with the sliding window segment given by the expression (6),is computed by the formula (34) by setting the index j to j+1. The firstand the second statistical dispersion quantities ε_(j) ² and ε_(j+1) ²are then used to compute the variation quantity δ_(j+1), as an example,according to the following formula:

$\begin{matrix}{\delta_{j + 1} = {\frac{\left( {ɛ_{j + 1}^{2} - ɛ_{j}^{2}} \right)^{2}}{ɛ_{j + 1}^{2}ɛ_{j}^{2}}.}} & (35)\end{matrix}$

The first and the second statistical dispersion quantities {circumflexover (ε)}_(j) ² and ε_(j+1) are used to compute the variation quantityδ_(j+1) in an analogous way.

This sixth embodiment is particularly suitable to detect anomalies innormal traffic data that is non-stationary or correlated on the slidingwindows considered, e.g., when the sliding window duration T isrelatively long and, in particular, when the elementary time interval ΔTis relatively long (e.g., ΔT=5 min).

Namely, the variance σ_(j) ², on a j^(th) sliding window, canequivalently be regarded as the minimum mean squared deviation of datafrom any constant approximation of data, where this minimum is achievedif (and only if) the constant, used for the approximation, is equal tothe mean value of data.

In a more general setting, the statistical dispersion quantity ε_(j) ²,defined as above, is equal to the minimum mean squared deviation of datafrom any affine approximation of data. It is noted that an affineapproximation is an affine function of data, which itself is the sum ofa linear function and a constant. The optimal affine approximationminimizing this mean squared deviation can be determined by the linearregression techniques and is not shown here, because all what matters isthe resulting minimum mean squared deviation, which is given by theformula (34), for a segment of data samples (x_(i=m) _(j) _(−n) _(j) ₊₁^(m) ^(j) corresponding to a generic j^(th) sliding window.

The formula (34) is valid under the assumption of regular data sampling,i.e., that the elementary time intervals have a fixed duration ΔT. Theexpression is readily generalized to deal with irregular data sampling,by substituting the irregular normalized timings of the data samples inthe considered sliding window for the regular normalized timingsi−m_(j)+n_(j), i=m_(j)−n_(j)+1, . . . , m_(j), and by substituting thecorresponding mean value of irregular normalized timings for the meanvalue of regular normalized timings (n_(j)+1)/2.

The generalized mean squared deviation is used in the same way as thevariance. If the data is stationary and uncorrelated, then ε_(j) ²≈δ_(j)². However, if the normal traffic data is non-stationary or correlated,then can be considerably smaller than σ_(j) ² and as such is more robustto be used as a measure of variation of data, because it eliminateslinear trends in the normal traffic data, due to its non-stationarity orcorrelation. On the other hand, ε_(j) ² is less sensitive to anomalouschanges of traffic data.

Seventh Embodiment

A seventh embodiment refers to the same definition of sliding windows asdescribed in relation to the first embodiment 300 (FIG. 6), but employscomputation algorithms for the variance a and the relative squareddifference of variances 8 different from the ones of the firstembodiment 300. It is observed that according to the first embodiment300 the variance a is repeatedly computed for overlapping segments thatmay have a lot of data samples in common.

According to the seventh embodiment, instead of re-computing thevariance for each new sliding segment by using the expressions (2)-(5)and the relative squared difference of variances by using the expression(7) or (8), the variance already computed for the preceding segment aswell as the corresponding relative squared difference of variances arebeing updated. This approach allows one to save in computations. Withreference to the data memory requirements, also for the seventhembodiment all the data samples belonging to a preceding sliding windowfor which the variance was previously computed need to be stored.

As described with reference to the first embodiment 300 (FIG. 6), twosuccessive sliding segments (x_(i))_(i=m) _(j) _(−n) _(j) ₊₁ ^(m) ^(j)and (x_(i))_(i=m) _(j+1) _(−n) _(j+1) ₁ can be expressed by (1) and (6),respectively. Then the number of samples from the second segment (6) notincluded in the first segment (1) is m_(j+1)−m_(j), whereas the numberof samples from the first segment (1) not included in the second segment(6) is m_(j+1)−m_(j)−n_(j+1)+n_(j). As these two numbers are differentif n_(j+), ≠n_(j), it is suitable to define their maximum Δn_(j) by:

Δn _(j)=max(m _(j+1) −m _(j) ,m _(j+1) −m _(j) −n _(j+1) +n _(j)).  36

Also, the following auxiliary value is defined as:

$\begin{matrix}{\mu_{j}^{\prime} = {\frac{S_{j}}{n_{j + 1}}.}} & (37)\end{matrix}$

The seventh embodiment includes an initial step in which, with referenceto a sliding window with index j=1, the following quantities arecomputed: S₁, μ_(1, Σ) ₁ ², and σ₁ ². Then the mean values andvariances, are iteratively computed for j=1,2,3, . . . , by using thefollowing update expressions:

$\begin{matrix}{S_{j + 1} = {S_{j} + {\sum\limits_{i = 0}^{{\Delta \; n_{j}} - 1}\left( {x_{m_{j + 1} - i}^{\prime} - x_{m_{j + 1} - n_{j + 1} - i}^{\prime}} \right)}}} & (38) \\{\mu_{j + 1} = \frac{S_{j + 1}}{n_{j + 1}}} & (39) \\{\mu_{j}^{\prime} = \frac{S_{j}}{n_{j + 1}}} & (40) \\\begin{matrix}{\Sigma_{j + 1}^{2} = {\Sigma_{j}^{2} + {\sum\limits_{i = 0}^{{\Delta \; n_{j}} - 1}{\begin{pmatrix}{x_{m_{j + 1} - i}^{\prime} -} \\x_{m_{j + 1} - n_{j + 1} - i}^{\prime}\end{pmatrix}\begin{pmatrix}{x_{m_{j + 1} - i}^{\prime} - \mu_{j + 1} +} \\{x_{m_{j + 1} - n_{j + 1} - i}^{\prime} - \mu_{j}^{\prime}}\end{pmatrix}}} -}} \\{{{\mu_{j}{\mu_{j}^{\prime}\left( {n_{j + 1} - n_{j}} \right)}},}}\end{matrix} & (41) \\{where} & \; \\{x_{m_{j + 1} - i}^{\prime} = \left\{ \begin{matrix}{x_{m_{j + 1} - i},{i \leq {m_{j + 1} - m_{j} - 1}}} \\{0,{i \geq {m_{j + 1} - m_{j}}}}\end{matrix} \right.} & (42) \\{x_{m_{j + 1} - n_{j + 1} - i}^{\prime} = \left\{ \begin{matrix}{x_{m_{j + 1} - n_{j + i} - i},{i \leq {m_{j + 1} - m_{j} - n_{j + 1} + n_{j} - 1}}} \\{0,{i \geq {m_{j + 1} - m_{j} - n_{j + 1} + {n_{j}.}}}}\end{matrix} \right.} & (43)\end{matrix}$

It is noted that (42) and (43) can equivalently be defined as follows:x′_(m) _(j+1) _(−i) is equal to x_(m) _(m+1) _(−m) if x′_(m) ₊₁ _(−i)belongs to the last part of the second segment not contained in thefirst segment and x′_(m) _(j+1) _(−n) _(j+1) _(−i) is equal to if x_(m)_(j+1) _(−n) _(j+1) _(−i) belongs to the initial part of the firstsegment not contained in the second segment and to zero otherwise,respectively. In other words, the shorter of the two non-overlappingparts of the two segments, which is not contained in the other segment,is filled in with zeros.

The relative squared difference of variances can then be computed as:

$\begin{matrix}{\delta_{j + 1} = {\frac{\left( {\sigma_{j + 1}^{2} - \sigma_{j}^{2}} \right)^{2}}{\sigma_{j + 1}^{2}\sigma_{j}^{2}} = {\frac{\left( {{n_{j}\Sigma_{j + 1}^{2}} - {n_{j + 1}\Sigma_{j}^{2}}} \right)^{2}}{n_{j}\Sigma_{j + 1}^{2}n_{j + 1}\Sigma_{j}^{2}} = {\frac{1}{\Sigma_{j + 1}^{2}\Sigma_{j}^{2}}{\begin{pmatrix}{{\sqrt{\frac{n_{j}}{n_{j + 1}}}{\sum\limits_{i = 0}^{{\Delta \; n_{j}} - 1}{\begin{pmatrix}{x_{m_{j + 1} - i}^{\prime} -} \\x_{m_{j + 1} - n_{j + 1} - i}^{\prime}\end{pmatrix}\begin{pmatrix}{x_{m_{j + 1} - i}^{\prime} - \mu_{j + 1} +} \\{x_{m_{j + 1} - n_{j + 1} - i}^{\prime} - \mu_{j}^{\prime}}\end{pmatrix}}}} -} \\{\frac{n_{j + 1} - n_{j}}{\sqrt{n_{j}n_{j + 1}}}\left( {\Sigma_{j}^{2} + {S_{j}\mu_{j}^{\prime}}} \right)}\end{pmatrix}^{2}.}}}}} & (44)\end{matrix}$

In the case when the numbers of samples in two successive segments areequal, i.e., n_(j+1)=n_(j), nΔ_(j) is given as:

Δn _(j) =m _(j+1) −m _(j)  (45)

and the update expressions and the expression for the relative squareddifference then simplify into:

$\begin{matrix}{\mu_{j + 1} = {\mu_{j} + \frac{\sum\limits_{i = 0}^{{\Delta \; n_{j}} - 1}\left( {x_{m_{j + 1} - i} - x_{m_{j + 1} - n_{j + 1} - i}} \right)}{n_{j}}}} & (46) \\{\Sigma_{j + 1}^{2} = {\Sigma_{j}^{2} + {\sum\limits_{i = 0}^{{\Delta \; n_{j}} - 1}\begin{pmatrix}{x_{m_{j + 1} - i} -} \\x_{m_{j + 1} - n_{j + 1} - i}\end{pmatrix}}}} & (47) \\\begin{matrix}{\delta_{j + 1} = \frac{\left( {\sigma_{j + 1}^{2} - \sigma_{j}^{2}} \right)^{2}}{\sigma_{j + 1}^{2}\sigma_{j}^{2}}} \\{= \frac{\left( {\Sigma_{j + 1}^{2} - \Sigma_{j}^{2}} \right)^{2}}{\Sigma_{j + 1}^{2}\Sigma_{j}^{2}}} \\{= {\frac{1}{\Sigma_{j + 1}^{2}\Sigma_{j}^{2}}{\left( {\sum\limits_{i = 0}^{{\Delta \; n_{j}} - 1}{\begin{pmatrix}{x_{m_{j + 1} - i} -} \\x_{m_{j + 1} - n_{j + 1} - i}\end{pmatrix}\begin{pmatrix}{x_{m_{j + 1} - i} - \mu_{j + 1} +} \\{x_{m_{j + 1} - n_{j + 1} - i} - \mu_{j}}\end{pmatrix}}} \right)^{2}.}}}\end{matrix} & (48)\end{matrix}$

Eighth Embodiment

An eighth embodiment refers to sliding windows of the type described forthe second embodiment (FIG. 7), but employs computation algorithms forthe variance a and the relative squared difference of variances gdifferent from the ones of the second embodiment. For the secondembodiment described above, the variances are computed and compared fortwo overlapping segments (x_(i))_(i=m) _(j+1) _(−n) _(j+1) ₊₁ ^(m) ^(j)and (x_(i))_(i=m) _(j+1) _(−n) _(j+1) ₊₁ ^(m) ^(j+1) given by theexpressions (10) and (11), respectively, for each j=1,2,3, . . . .According to the eighth embodiment, a basic way of performing moreefficient computations is to compute the variances separately for thesliding segments (11), for j=1,2,3, . . . , and for the shortenedsliding segments (10), for j=1,2,3, . . . , by using the updateexpressions given above according to the seventh embodiment, which areadapted to deal with the shortened sliding windows. This roughly doublesthe computation time in comparison with the seventh embodiment.

According to the eighth embodiment, another, even more efficient way ofperforming the computations is to compute the variances for theshortened sliding segments (10), for j=1,2,3, . . . , by using adaptedupdate expressions from the seventh embodiment. Then, for each j=1,2,3,. . . , the variance of segment (11) is computed from the variance ofsegment (10), together with the corresponding relative squareddifference to of the two variances, either by using the adapted updateexpressions or, alternatively, by using the following simplified andnumerically more convenient expressions, in which Δn_(j)=m_(j+1)−m_(j)and {circumflex over (n)}_(j+)=n_(j+1)−Δn_(j):

$\begin{matrix}{\mu_{j + 1} = {{\hat{\mu}}_{j} + \frac{\sum\limits_{i = 0}^{{\Delta \; n_{j}} - 1}\left( {x_{m_{j + 1}} - {\hat{\mu}}_{j}} \right)}{n_{j + 1}}}} & (49) \\{\Sigma_{j + 1}^{2} = {{\hat{\Sigma}}_{j}^{2} + {\sum\limits_{i = 0}^{{\Delta \; n_{j}} - 1}{\left( {x_{m_{j + 1} - i} - {\hat{\mu}}_{j}} \right)\left( {x_{m_{j + 1} - i} - \mu_{j + 1}} \right)}}}} & (50) \\\begin{matrix}{\delta_{j + 1} = \frac{\left( {\sigma_{j + 1}^{2} - {\hat{\sigma}}_{j}^{2}} \right)^{2}}{\sigma_{j + 1}^{2}{\hat{\sigma}}_{j}^{2}}} \\{= \frac{\left( {{{\hat{n}}_{j}\Sigma_{j + 1}^{2}} - {n_{j + 1}{\hat{\Sigma}}_{j}^{2}}} \right)^{2}}{{\hat{n}}_{j}\Sigma_{j + 1}^{2}n_{j + 1}{\hat{\Sigma}}_{j}^{2}}} \\{= {\frac{1}{\Sigma_{j + 1}^{2}{\hat{\Sigma}}_{j}^{2}}{\begin{pmatrix}{{\sqrt{\frac{{\hat{n}}_{j}}{n_{j + 1}}}{\sum\limits_{i = 0}^{{\Delta \; n_{j}} - 1}{\begin{pmatrix}{x_{m_{j + 1} - i} -} \\{\hat{\mu}}_{j}\end{pmatrix}\begin{pmatrix}{x_{m_{j + 1} - i} -} \\\mu_{j + 1}\end{pmatrix}}}} -} \\{\frac{\Delta \; n_{j}}{\sqrt{{\hat{n}}_{j}n_{j + 1}}}{\hat{\Sigma}}_{j}^{2}}\end{pmatrix}^{2}.}}}\end{matrix} & (51)\end{matrix}$

Ninth Embodiment

In accordance with a ninth embodiment, employing the minimum meansquared deviation ε_(j) ² described with reference to the sixthembodiment, the corresponding expression for the mean squared deviationε_(j) ² is updated by updating the variance σ_(j) ² and the mean valueμ_(j) in the same way as defined above for the seventh or the eighthembodiment, as well as by updating the arithmetic mean:

$\begin{matrix}{\frac{1}{n_{j}}{\sum\limits_{i = {m_{j} - n_{j} + 1}}^{m_{j}}{\left( {i - m_{j} + n_{j}} \right){x_{i}.}}}} & (52)\end{matrix}$

More precisely, for the windows according to the first embodiment 300described above, the arithmetic mean associated with (x_(i))_(i=m)_(j+1) _(−n) _(j+1) ₊₁ ^(m) ^(j+1) can be computed from the arithmeticmean associated with (x_(i))_(i=m) _(j) _(−n) _(j+1) ₊₁ ^(m) ^(j) byusing:

$\begin{matrix}{{\frac{1}{n_{j + 1}}{\sum\limits_{i = {m_{j + 1} - n_{j + 1} + 1}}^{m_{j + 1}}{\left( {i - m_{j + 1} + n_{j + 1}} \right)x_{i}}}} = {{\frac{n_{j}}{n_{j + 1}}\left( {\frac{1}{n_{j}}{\sum\limits_{i = {m_{j} - n_{j} + 1}}^{m_{j}}{\left( {i - m_{j} + n_{j}} \right)x_{i}}}} \right)} + {\frac{1}{n_{j + 1}}\left( {{\sum\limits_{i = {m_{j} + 1}}^{m_{j + 1}}{\left( {i - m_{j} + n_{j}} \right)x_{i}}} - {\sum\limits_{i = {m_{j} - n_{j} + 1}}^{m_{j + 1} - n_{j + 1}}{\left( {i - m_{j} + n_{j}} \right)x_{i}}}} \right)} - {\left( {m_{j + 1} - m_{j} - n_{j + 1} + n_{j}} \right){\mu_{j + 1}.}}}} & (53)\end{matrix}$

Similarly, for the windows according to the second embodiment describedabove, the arithmetic mean associated with (x_(i))_(i=m) _(j+1) _(−n)_(j+1) ₊₁ ^(m) ^(j+1) can be computed from the arithmetic meanassociated with (x_(i))_(i=m) _(j+1) _(−n) _(j+1) ₊₁ ^(m) ^(j) by using:

$\begin{matrix}{{\frac{1}{n_{j + 1}}{\sum\limits_{i = {m_{j + 1} - n_{j + 1} + 1}}^{m_{j + 1}}{\left( {i - m_{j + 1} + n_{j + 1}} \right)x_{i}}}} = {{\frac{{\hat{n}}_{j}}{n_{j + 1}}\left( {\frac{1}{{\hat{n}}_{j}}{\sum\limits_{i = {m_{j + 1} - n_{j + 1} + 1}}^{m_{j}}{\left( {i - m_{j + 1} + n_{j + 1}} \right)x_{i}}}} \right)} + {\frac{1}{n_{j + 1}}{\left( {\sum\limits_{i = {m_{j} + 1}}^{m_{j + 1}}{\left( {i - m_{j + 1} + n_{j + 1}} \right)x_{i}}} \right).}}}} & (54)\end{matrix}$

The findings and teachings of the present invention show manyadvantages. Theoretical considerations and the simulations made (e.g.,FIG. 4, FIG. 5, FIG. 9, and FIG. 10) indicate that the methods describedare reliable for detecting anomalies in traffic data due to a change ofvariance, regardless of the absolute value of this variance.

Moreover, it should be observed that the example of FIG. 2 and the otherdescribed embodiments can be implemented by avoiding a comparison of thecurrent traffic with a corresponding statistical model based onhistorical data, which is typical of the prior art methods. It has to benoticed that if the estimated statistical model of the traffic does notreflect the normal traffic behavior sufficiently accurately, then thefalse-positive rate will be high, provided that the thresholds arechosen so as to achieve a reasonably low false-negative rate.Alternatively, if the thresholds are chosen so as to achieve areasonable low false-positive rate, then the false-negative rate will behigh. Therefore, the possibility of performing a detection irrespectiveof statistical models, improves the reliability of the describeddetection methods in comparison with prior art techniques (for example,see the above mentioned article “Load characterization and anomalydetection for voice over IP traffic”, M. Mandjes, I. Saniee, and A. L.Stolyar).

Furthermore, the methods described above are mathematically relativelysimple, sufficiently robust to changes inherent to normal traffic, andyet capable of detecting anomalous traffic due to attacks such as (D)DoSattacks, SPAM and SPIT attacks, and scanning attacks, as well as massivemalicious software attacks. For example, the detection methods describedabove do not need complex computations in contrast with the waveletcomputation disclosed in the above cited paper “DDoS detection andwavelets”, L. Li and G. Lee. As such, they are suitable to be applied inhigh-speed and high-volume communication networks.

In general, in the following claims, the terms used should not beconstrued to limit the claims to the specific embodiments disclosed inthe specification and the claims, but should be construed to include allpossible embodiments along with the full scope of equivalents to whichsuch claims are entitled. Accordingly, the claims are not limited by thedisclosure.

1-25. (canceled)
 26. A method of detecting anomalies in a communicationsystem, comprising: providing a first packet flow portion and a secondpacket flow portion; extracting samples of a numerical featureassociated with a traffic status of the first and second packet flowportions; computing from said extracted samples a first statisticaldispersion quantity and a second statistical dispersion quantity of thenumerical feature associated with the first and second packet flowportions, respectively; computing from said dispersion quantities avariation quantity representing a dispersion change from the firstpacket flow portion to the second packet flow portion; comparing thevariation quantity with a comparison value; and detecting an anomaly inthe system in response to said comparison.
 27. The detection method ofclaim 26, wherein said first statistical dispersion quantity is a firstvariance of the numerical feature associated with the first packet flowportion and said second statistical dispersion quantity is a secondvariance of the numerical feature associated with the second packet flowportion.
 28. The detection method of claim 26, wherein said variationquantity is related to a difference between the first statisticaldispersion quantity and the second statistical dispersion quantity. 29.The detection method of claim 28, wherein said variation quantity is asquared difference of the first statistical dispersion quantity and thesecond statistical dispersion quantity.
 30. The detection method ofclaim 26, wherein extracting samples of a numerical feature comprisesselecting the numerical feature from a plurality of features comprising:packet size in bytes; total number of packets in a time interval oflength; total number of layer 3 bytes in a time interval of length;average packet size in a time interval of length, expressed in bytes;packet rate in a time interval of length; and byte rate in a timeinterval of length.
 31. The detection method of claim 27, whereincomputing each of said first variance and second variance comprises:computing a summation of the samples associated with one of said firstand second packet flow portions; computing a mean value of the samplesassociated with one of said first and second packet flow portions;computing a summation of the squared distances from the mean value ofthe samples associated with one of said first and second packet flowportions; and computing each of said first and second variances from thecorresponding summation of the squared distances from the mean value.32. The detection method of claim 26, wherein providing said first andsecond packet flow portions comprises: defining a first time windowcomprising the first packet flow portion and an associated first samplesegment of the numerical feature; and defining a second time windowcomprising the second flow portion and an associated second samplesegment of the numerical feature, wherein said first statisticaldispersion quantity and said second statistical dispersion quantity arecomputed from the first and second sample segments, respectively. 33.The detection method of claim 32, wherein the first and second windowshave a same time length.
 34. The detection method of claim 32, whereinthe second window is shifted in time with respect to the first window bya delay.
 35. The detection method of claim 34, further comprising aftera time interval equal to said delay: defining further first and secondwindows by sliding the first and second windows by said delay; andrepeating the method to detect an anomaly applying the method to furtherfirst and second packet flow portions corresponding to said furtherfirst and second windows, respectively.
 36. The detection method ofclaim 32, wherein the first sample segment comprises an initial part ofthe second sample segment, the second sample segment comprising an endpart which is separate from the first segment.
 37. The detection methodof claim 36, further comprising, after a time interval equal to a delay:defining further first and second sample segments by sliding the firstand second sample segments by said delay; and repeating the method todetect an anomaly applying the method to further first and second samplesegments.
 38. The detection method of claim 32, wherein the second timewindow comprises a fixed initial time point coincident to a fixedinitial time point of the first time window and an end time pointobtained by extending the first time window by an advancing time value.39. The detection method of claim 26, further comprising: extractingfurther samples of a further numerical feature associated with a trafficstatus of the first and second packet flow portions; computing from saidfurther samples additional statistical dispersion quantities of saidfurther numerical feature associated with the first and second packetflow portions; and computing a further variation quantity representinganother dispersion change from the first packet flow portion to thesecond packet flow portion.
 40. The detection method of claim 39,wherein computing from said dispersion quantities a variation quantity,representing a dispersion change from the first packet flow portion tothe second packet flow portion, comprises: computing a first variationquantity from said dispersion quantities; and combining the firstvariation quantity and the further variation quantity to obtain saidvariation quantity.
 41. The detection method of claim 39, whereincomparing the variation quantity with a comparison value furthercomprises: comparing the further variation quantity with a furthercomparison value; and detecting an anomaly in the system in response tosaid comparison of the further variation quantity with the furthercomparison value.
 42. The detection method of claim 26, wherein saidfurther statistical dispersion quantity is a first minimum mean squareddeviation from any affine approximation of numerical feature valuesassociated with the first packet flow portion and said secondstatistical dispersion quantity is a second minimum mean squareddeviation from any affine approximation of numerical feature valuesassociated with the second packet flow portion.
 43. The detection methodof claim 32, wherein computing the second statistical dispersionquantity comprises: updating the computed first statistical dispersionquantity taking into account samples of the second segment not includedin the first segment.
 44. The detection method of claim 26, furthercomprising: selecting the comparison value from: a fixed value, avariable value, an adaptive value, and a value depending on historicaltraffic data.
 45. The detection method of claim 26, wherein the detectedanomaly is due to at least one of the following causes: a failure of acommunication system element and an attack.
 46. The detection method ofclaim 26, further comprising: aggregating samples of the numericalfeature values of different network flows according to selected packetparameters; and applying the method to said aggregated samples.
 47. Anapparatus capable of detecting anomalies in a packet switchedcommunication system, comprising: a collection module capable of storingsamples of a numerical packet feature associated with traffic status ofa first packet flow portion and a second packet flow portion; acomputing module capable of being arranged so as to: compute from saidsamples a first statistical dispersion quantity and a second statisticaldispersion quantity of the numerical feature associated with the firstand second packet flow portions, respectively; and compute from saiddispersion quantities a variation quantity representing a dispersionchange from the first packet flow portion to the second packet flowportion; and a detection module arranged so as to: compare the variationquantity with a comparison value; and detect an anomaly in the system inresponse to said comparison.
 48. The apparatus of claim 47, furthercomprising a flow aggregation module for grouping numerical packetfeature values of different network flows according to selected packetparameters.
 49. A packet switched communication system comprising: anextractor module capable of extracting samples of a numerical featureassociated with a traffic status of a first packet flow portion and asecond packet flow portion; and an apparatus capable of detectinganomalies connected to said extractor module and arranged in accordancewith the apparatus capable of detecting anomalies in a packet switchedcommunication system of claim
 47. 50. A computer program productcomprising program codes capable of performing the detection method ofdetecting anomalies in a communication system according to claim 26.